Active Learning of Input Grammars
Abstract
Knowing the precise format of a program’s input is a necessary prerequisite for systematic testing. Given a program and a small set of sample inputs, we (1) track the data flow of inputs to aggregate input fragments that share the same data flow through program execution into lexical and syntactic entities; (2) assign these entities names that are based on the associated variable and function identifiers; and (3) systematically generalize production rules by means of membership queries. As a result, we need only a minimal set of sample inputs to obtain human-readable context-free grammars that reflect valid input structure. In our evaluation on inputs like URLs, spreadsheets, or configuration files, our AUTOGRAM prototype obtains input grammars that are both accurate and very readable—and that can be directly fed into test generators for comprehensive automated testing.
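Step (3) of the abstract, generalizing a production rule by means of membership queries, can be illustrated with a minimal Python sketch. This is not AUTOGRAM's implementation. The oracle `accepts`, the candidate classes in `CANDIDATES`, and the probe strings are all hypothetical and chosen only for illustration. In a real setting the oracle would run the program under test and report whether it accepts the input.

```python
import re
from typing import Callable, Dict, List


def accepts(text: str) -> bool:
    """Hypothetical membership oracle.

    Stands in for "run the program under test and check whether it
    accepts the input". Here it accepts lines such as 'port=8080'.
    """
    return re.fullmatch(r"[a-z]+=[0-9]+", text) is not None


# Assumed candidate generalizations, each with a few probe strings.
CANDIDATES: Dict[str, List[str]] = {
    "DIGITS": ["0", "7", "42", "0001"],
    "LOWER": ["a", "xyz", "host"],
    "ALNUM": ["a1", "9z", "b2b"],
}


def generalize(sample: str, fragment: str,
               oracle: Callable[[str], bool]) -> List[str]:
    """Return the candidate classes that may replace `fragment` in `sample`.

    A class is kept only if every probe string, substituted for the first
    occurrence of the fragment, yields an input the oracle still accepts.
    """
    start = sample.index(fragment)
    end = start + len(fragment)
    kept = []
    for name, probes in CANDIDATES.items():
        if all(oracle(sample[:start] + p + sample[end:]) for p in probes):
            kept.append(name)
    return kept


if __name__ == "__main__":
    sample = "port=8080"
    print(generalize(sample, "8080", accepts))
    print(generalize(sample, "port", accepts))
```

A finite set of probes can only support a generalization, not prove it, so a rule generalized this way stays a hypothesis that further queries may refute.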
Similar resources
D. Béchet, A. Foret
This paper investigates the learnability by positive examples in the sense of Gold of Pregroup Grammars. In a first part, Pregroup Grammars are presented and a new parsing strategy is proposed. Then, theoretical learnability and non-learnability results for subclasses of Pregroup Grammars are proved. In the last two parts, we focus on learning Pregroup Grammars from a special kind of input call...
Learning Node Replacement Graph Grammars in Metabolic Pathways
This paper describes a graph-based relational, unsupervised learning algorithm to infer node replacement graph grammars and its application to metabolic pathways. We search for frequent subgraphs and then check for overlap among the instances of the subgraphs in the input graph. If subgraphs overlap by one node, we propose a node replacement graph grammar production. We also can infer a hierarchy ...
Learning Multiple Languages in Groups
We consider a variant of Gold’s learning paradigm where a learner receives as input n different languages (in the form of one text where all input languages are interleaved). Our goal is to explore the situation when a more “coarse” classification of input languages is possible, whereas more refined classification is not. More specifically, we answer the following question: under which conditions, ...
Rapidly Deploying Grammar-Based Speech Applications with Active Learning and Back-off Grammars
Grammar-based approaches to spoken language understanding are utilized to a great extent in industry, particularly when developers are confronted with data sparsity. In order to ensure wide grammar coverage, developers typically modify their grammars in an iterative process of deploying the application, collecting and transcribing user utterances, and adjusting the grammar. In this paper, we ex...
Partial Learning Using Link Grammars Data
Kanazawa has shown that several non-trivial classes of categorial grammars are learnable in Gold’s model. We propose in this article to adapt this kind of symbolic learning to natural languages. In order to compensate for the combinatorial explosion of the learning algorithm, we suppose that a small part of the grammar to be learned is given as input. That is why we need some initial data to test t...
Journal: CoRR
Volume: abs/1708.08731
Publication year: 2017